This is going to be a simple report. I am going to build the networks, visualize them, and outline some of their general features. Then, I am going to try out some initial analyses on some of the static networks. The main goal is to see how the networks’ capacity to transmit culture changes as we remove nodes. Here, I will use random removal and will examine how this affects two things: the global efficiency of the network and how quickly information diffuses across it. I am going to keep all the code visible so that everything I do is transparent.
I am going to begin looking at the elephant data. We have three networks represented through association matrices. However, the matrices reflect distance and, thus, 1 signals that the elephants were never close, while 0 denotes maximum association.
Let’s begin by loading the data and building a network from it.
# Packages
library(tidyverse)
library(igraph)
library(patchwork)
library(brainGraph)
library(bayestestR)
# Import the data
elephant_data <- read_csv("Data/dist.matrix.t1.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## .default = col_double(),
## X1 = col_character()
## )
## See spec(...) for full column specifications.
# Transform data into square matrix
elephant_matrix <- elephant_data %>%
select(2:ncol(elephant_data)) %>%
as.matrix()
# Change the names so that nodes have the same ID as in the dataset
node_IDs <- names(elephant_data)[2:ncol(elephant_data)]
colnames(elephant_matrix) <- node_IDs
rownames(elephant_matrix) <- node_IDs
Given that we have an “inverse” association matrix, we need to do some algebra to create the network. What I am going to do is create a matrix of equal dimensions populated with 1s and then I will subtract our elephant matrix from it.
# Build inverse matrix
inv_elephant_matrix <- matrix(1, 97, 97) - elephant_matrix
# Populate the diagonal with 0s
diag(inv_elephant_matrix) <- 0
# Now create the network
elephant_graph_inv <-
graph_from_adjacency_matrix(inv_elephant_matrix, mode = "undirected", weighted = T)
# Let's see the first three nodes
inv_elephant_matrix[1:3,1:3]
## R1 R2 R3
## R1 0.000000 0.9610390 0.9740260
## R2 0.961039 0.0000000 0.9866667
## R3 0.974026 0.9866667 0.0000000
Here’s an example of our matrix. We notice pretty strong associations between the first three elephants. Let’s look at some descriptive features of our network.
# Get the degree distribution of the graph
tibble(degree = degree(elephant_graph_inv)) %>%
ggplot(aes(x = degree)) +
geom_histogram(bins = 50) +
theme_bw() +
labs(y = "",
title = "Degree Distribution")
# Get the distribution of weights
tibble(weight = E(elephant_graph_inv)$weight) %>%
ggplot(aes(x = weight)) +
geom_histogram(bins = 50) +
theme_bw() +
labs(y = "",
title = "Weight Distribution")
The degree distribution is centered around relatively high values, meaning that most agents have at least some association to the others. But the distribution of weights shows that the bulk of those associations are tenuous. The strong associations we saw above are very much the exception and not the rule.
Now, I am going to plot the network. I’ll present the network as is but I will also impose a threshold on what constitutes a tie. The second plot will represent the network if we only count the associations that are bigger than the average weight. Additionally, we have information about kinship ties in our matrix. I’ll incorporate that into our visualizations: red edges will represent kinship ties.
# Write a function for plotting
# It accepts three arguments m = matrix, t = threshold, c = caption
plot_with_threshold <- function(m, t, c) {
# Copy of our matrix
copy_mat <- m
# Replace values below threshold with 0
copy_mat[copy_mat < t] <- 0
# Now create the network
graph <-
graph_from_adjacency_matrix(copy_mat, mode = "undirected", weighted = T)
# Build an edgelist to find family ties
edgelist <- get.data.frame(graph)
# See which lines fullfil the requirements for kinship ties
kinship_ties <- rep(NA, nrow(edgelist))
for (i in 1:nrow(edgelist)) {
pat_from <- edgelist$from[i]
if (str_detect(edgelist$to[i], "\\.") != TRUE) {
kinship_ties[i] <- 0
} else {
pat_to <- sub("\\..*", "", edgelist$to[i])
if (pat_from==pat_to) {
kinship_ties[i] <- 1
} else {
kinship_ties[i] <- 0
}
}
}
edgelist$kinship <- kinship_ties
# Color by family tie
E(graph)$color <- ifelse(kinship_ties==1, "red", "grey")
plot(graph, layout = layout.fruchterman.reingold,
vertex.label = "", vertex.size = 2, vertex.color = "white", edge.width = E(graph)$weight, main = paste0(c, "\n", "Threshold = ", t))
}
plot_with_threshold(m = inv_elephant_matrix, t = 0,
c = "Elephants Wave 1")
plot_with_threshold(m = inv_elephant_matrix, t = round(mean(inv_elephant_matrix), 3),
c = "Elephants Wave 1")
We see a densely connected network that seems to have distinct cliques. When we impose the threshold on what constitutes a tie, we do see more clearly these small clusters. These well-connected cliques, loosely tied to each other, resonates with what we see in other elephant networks.
Let’s inspect the second wave. I am going to produce similar descriptive plots and network visualizations
# Import the data
elephant_data_w2 <- read_csv("Data/dist.matrix.t2.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## .default = col_double(),
## X1 = col_character()
## )
## See spec(...) for full column specifications.
# Transform data into square matrix
elephant_matrix_w2 <- elephant_data_w2 %>%
select(2:ncol(elephant_data_w2)) %>%
as.matrix()
# Change the names so that nodes have the same ID as in the dataset
node_IDs_w2 <- names(elephant_data_w2)[2:ncol(elephant_data_w2)]
colnames(elephant_matrix_w2) <- node_IDs_w2
rownames(elephant_matrix_w2) <- node_IDs_w2
# Build inverse matrix
inv_w2<- matrix(1, 130, 130) - elephant_matrix_w2
# Populate the diagonal with 0s
diag(inv_w2) <- 0
elephant_graph_w2 <- graph_from_adjacency_matrix(inv_w2, mode = "undirected", weighted = T)
# Get the degree distribution of the graph
tibble(degree = degree(elephant_graph_w2)) %>%
ggplot(aes(x = degree)) +
geom_histogram(bins = 50) +
theme_bw() +
labs(y = "",
title = "Degree Distribution",
subtitle = "Wave 2")
# Get the distribution of weights
tibble(weight = E(elephant_graph_w2)$weight) %>%
ggplot(aes(x = weight)) +
geom_histogram(bins = 50) +
theme_bw() +
labs(y = "",
title = "Weight Distribution",
subtitle = "Wave 2")
## Warning: Removed 54 rows containing non-finite values (stat_bin).
We have a degree distribution centered around a higher value but also a longer left tail, meaning there are quite a few nodes that are not that well connected. Again the distribution of weights suggests most connections are rather weak.
plot_with_threshold(m = inv_w2,
t = 0,
c = "Elephants Wave 2")
plot_with_threshold(m = inv_w2,
t = round(mean(inv_w2, na.rm = T), 3),
c = "Elephants Wave 2")
We see a similar pattern as above: a densely connected network whose cliques become clear when we impose some threshold on the ties. This network, however, is more densely connected than the one from the first wave.
Let’s finish this initial exploration with wave 3.
# Read in the data
elephant_data_w3 <- read_csv("Data/dist.matrix.t3.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## .default = col_double(),
## X1 = col_character()
## )
## See spec(...) for full column specifications.
# Transform data into square matrix
elephant_matrix_w3 <- elephant_data_w3 %>%
select(2:ncol(elephant_data_w3)) %>%
as.matrix()
# Change the names so that nodes have the same ID as in the dataset
node_IDs_w3 <- names(elephant_data_w3)[2:ncol(elephant_data_w3)]
colnames(elephant_matrix_w3) <- node_IDs_w3
rownames(elephant_matrix_w3) <- node_IDs_w3
# Build inverse matrix
inv_w3 <- matrix(1, 120, 120) - elephant_matrix_w3
# Populate the diagonal with 0s
diag(inv_w3) <- 0
elephant_graph_w3 <- graph_from_adjacency_matrix(inv_w3, mode = "undirected", weighted = T)
# Get the degree distribution of the graph
tibble(degree = degree(elephant_graph_w3)) %>%
ggplot(aes(x = degree)) +
geom_histogram(bins = 50) +
theme_bw() +
labs(y = "",
title = "Degree Distribution",
subtitle = "Wave 3")
# Get the distribution of weights
tibble(weight = E(elephant_graph_w3)$weight) %>%
ggplot(aes(x = weight)) +
geom_histogram(bins = 50) +
theme_bw() +
labs(y = "",
title = "Weight Distribution",
subtitle = "Wave 3")
## Warning: Removed 88 rows containing non-finite values (stat_bin).
A similar pattern: a densely connected networks with mostly weak ties.
We can visualize the network in the same way we have been doing so far.
plot_with_threshold(m = inv_w3,
t = 0,
c = "Elephants Wave 3")
plot_with_threshold(m = inv_w3,
t = round(mean(inv_w3, na.rm = T), 3),
c = "Elephants Wave 3")
We see a similar structure than the one for wave 2 but it seems to be even more densely connected. We notice a higher proportion of relatively strong ties when we impose the threshold.
Now, I am going to look at the dolphin data. I am going to do the same for the networks we have here. We still have association indices here but in the form of an edgelist. This will make the creation of the networks a bit more straightforward. We also have a relatedness coefficient here that I want to work into the visualizations. Ties that have above average relatedness will be colored red.
# Import the data
dolphin_edge_lists <- read_csv("Data/dolphin_edge_lists.csv")
## Parsed with column specification:
## cols(
## ID1 = col_character(),
## ID2 = col_character(),
## T2008 = col_double(),
## T2010 = col_double(),
## T2012 = col_double(),
## T2014 = col_double(),
## T2016 = col_double(),
## T2018 = col_double(),
## relatedness_coef = col_double()
## )
# Function to plot the networks
plot_edgelist <- function(w, t) {
# Conditional statements for the waves
if (w == 1) {
c <- "T2008"
title <- "Wave 1"
} else {
if(w ==2) {
c <- "T2010"
title <- "Wave 2"
} else {
if (w==3) {
c <- "T2012"
title <- "Wave 3"
} else {
if (w == 4) {
c <- "T2014"
title <- "Wave 4"
} else {
if (w == 5) {
c <- "T2016"
title <- "Wave 5"
} else {
c <- "T2018"
title <- "Wave 6"
}
}
}
}
}
# Take the wave
edgelist <- dolphin_edge_lists %>%
select(1,2, c, 9) %>%
filter(!is.na(.[,3]) & .[,3] > t) %>%
rename(weight = c,
from = ID1,
to = ID2)
net <- graph_from_data_frame(edgelist, directed = FALSE)
E(net)$color <- ifelse(edgelist$relatedness_coef >= mean(edgelist$relatedness_coef, na.rm = T),
"red", "grey")
plot(net, layout = layout.fruchterman.reingold,
vertex.label = "",
vertex.size = 3,
vertex.color = "white", edge.width = E(net)$weight,
main = paste0("Dolphins - ", title, "\n", "Threshold = ", t))
}
We are ready to start plotting. We have six networks here so I’m going to keep it short. I’ll plot them all using a threshold of 0.05.
walk(c(1:6), plot_edgelist, t = 0.05)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(c)` instead of `c` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
In most waves, we notice one main component consisting of individuals who are quite related to one another. Another subgraph of unrelated individuals at times consolidates, like in wave 2 and 3. However, there are times when we do not clearly see another connected clique, like in waves 4 or 6.
One way of looking at a network’s capacity for information transmission is by looking at its efficiency. Latora and Marchiori define the efficiency in communication between to nodes i and j as inversely proportional to the distance between them.
\[ Efficiency_{ij} = \frac{1}{d_{ij}}\] The average efficiency of a network G is the sum of all the inverse distances between each pair of nodes.
In this section, then, I will do the following:
I will choose two specific networks, one for elephants and one for dolphins.
I will calculate the global efficiency of the networks.
I will remove one node at a time - up to twenty - and will calculate a range of efficiency scores after each random removal.
I will work with wave 2 of both the elephant and dolphin data. Ties will only count if the edge weight is above or equal to the median edge weight in the association matrix. Let’s visualize the networks for clarity.
# Create the backboned elephant network
enet <- inv_w2
# Replace values below threshold with 0
enet[enet <= median(enet, na.rm = T)] <- 0
# Now create the network
elephant_graph <-
graph_from_adjacency_matrix(enet, mode = "undirected", weighted = T)
# Plot for clarity
plot_with_threshold(m = inv_w2,
t = round(mean(inv_w2, na.rm = T), 3),
c = "Elephants - Wave 2")
# Now create the dolphin network
# Take the wave 2 data
edgelist <- dolphin_edge_lists %>%
select(1,2, T2010, 9) %>%
filter(!is.na(T2010) & T2010 >= mean(T2010, na.rm = TRUE)) %>%
rename(weight = T2010,
from = ID1,
to = ID2)
dolphin_net <- graph_from_data_frame(edgelist, directed = FALSE)
average <- median(edgelist$weight, na.rm = T) %>% round(3)
plot_edgelist(w = 2,
t = average)
Now that we have the networks, we can begin to examine how random removal might affect the global efficiency.
# Write a function to explore random removal
efficiency_after_random_removal <- function(g, reps, num_removals) {
# placeholder
efficiencies <- rep(NA, reps)
for (i in 1:reps) {
rmv <- sample(1:vcount(g), num_removals)
ng <- delete.vertices(g,rmv)
eff <- efficiency(ng, type = "global", weights = E(ng)$weight)
efficiencies[i] <- eff
}
avg_eff <- mean(efficiencies)
sd_eff <- sd(efficiencies)
upper <- avg_eff + 1.96*sd_eff
lower <- avg_eff - 1.96*sd_eff
return(c(avg_eff = avg_eff,
upper = upper,
lower = lower,
removals = num_removals))
}
# Random removal dataframes
dolphin_rr_df <- map_df(c(1:20),
efficiency_after_random_removal,
g=dolphin_net,
reps = 100)
elephant_rr_df <- map_df(c(1:20),
efficiency_after_random_removal,
g=elephant_graph,
reps = 100)
# Add a column for the animal and then bind them
dolphin_rr_df$network <- "dolphin"
elephant_rr_df$network <- "elephant"
removal_df <- rbind(dolphin_rr_df, elephant_rr_df)
removal_df %>%
ggplot(aes(x = removals, y = avg_eff)) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
geom_line(color = "gray") +
facet_wrap(~network) +
theme_bw() +
labs(title = "Global efficiency after random removal",
x = "Efficiency",
y = "Removals")
We notice that the efficiency of these networks seems quite robust to random removal, at least up to 20 nodes. We do notice that the variance in efficiencies expands quite a bit. It seems that removing certain nodes does have an effect but that - understandably - does not change the central tendency here.
I am going to do the same procedure but removing the most central nodes, one at a time. Maybe this will have a bigger effect.
# Write a function for removal of central nodes
efficiency_after_centrality_removal <- function(g, reps, num_removals) {
# placeholder
efficiencies <- rep(NA, reps)
for (i in 1:reps) {
centrality <- eigen_centrality(g)$vector
centrality <- sort(centrality, decreasing = T)
central_nodes <- names(centrality)
ng <- delete.vertices(g,central_nodes[1:num_removals])
eff <- efficiency(ng, type = "global", weights = E(ng)$weight)
efficiencies[i] <- eff
}
avg_eff <- mean(efficiencies)
sd_eff <- sd(efficiencies)
upper <- avg_eff + 1.96*sd_eff
lower <- avg_eff - 1.96*sd_eff
return(c(avg_eff = avg_eff,
upper = upper,
lower = lower,
removals = num_removals))
}
# Random removal dataframes
dolphin_rr_df <- map_df(c(1:20),
efficiency_after_centrality_removal,
g=dolphin_net,
reps = 100)
elephant_rr_df <- map_df(c(1:20),
efficiency_after_centrality_removal,
g=elephant_graph,
reps = 100)
# Add a column for the animal and then bind them
dolphin_rr_df$network <- "dolphin"
elephant_rr_df$network <- "elephant"
removal_df <- rbind(dolphin_rr_df, elephant_rr_df)
removal_df %>%
ggplot(aes(x = removals, y = avg_eff)) +
geom_line(color = "gray") +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
facet_wrap(~network) +
theme_bw() +
labs(title = "Global efficiency after centrality removal",
x = "Efficiency",
y = "Removals")
We do notice here a slight decrease in global efficiency.
I am going to examine now how diffusion trajectories might be affected by random removal. To examine diffusion, I will use a contagion model that allows for both simple and complex contagion.
# Contagion model from Acerbi et al (2020)
info_contagion <- function(net, rewire, e = 1, r_max, sim = 1){
# Rewire network if random is set to TRUE
if(rewire){
net <- rewire(graph = net, with = keeping_degseq(loops = F, niter = 10^3))
}
# Get adjacency matrix from network
adjm <- get.adjacency(net, sparse = F)
# Turn adjacency matrix into boolean (TRUE / FALSE) - if you dont want weights
# adjm_bool <- adjm > 0
# Set number of individuals based adjacency matrix
N <- vcount(net)
# Create a vector indicating possession of info and set one entry to TRUE
info <- rep(FALSE, N)
info[sample(x = N, size = 1)] <- TRUE
# Create a reporting variable
proportion <- rep(0, r_max)
# Rounds
for(r in 1:r_max){
# In random sequence go through all individuals without info
for(i in sample(N)){
# Select i's neighbourhood
nei <- adjm[i,] > 0
# If you dont want to include weights, quote above, unquote below
#nei <- adjm_bool[i,]
# Proceed if there is at least one neighbour
if(sum(nei) > 0){
# Simple contagion for e = 1 and complex contagion for e = 2
if(runif(n = 1, min = 0, max = 1) <= (sum(info * nei) / length(nei))^e){
info[i] <- TRUE
}
}
}
# Record proportion of the population with info
proportion[r] <- sum(info) / N
# Increment the round counter
r <- r + 1
}
# Return a tibble with simulation results
return(tibble(time = 1:r_max,
proportion = proportion,
time_to_max = which(proportion == max(proportion))[1],
e = e,
network = ifelse(test = rewire, yes = "random", no = "model output"),
sim = sim))
}
What I am going to do here is: 1) Run contagion model. 2) Run contagion model with node removal. 3) Calculate areas under the curve for both diffusion trajectories. 4) Find the ratio between these areas. 5) Repeat 1 through 4 so we get a distribution of differences of areas under the curve.
Let’s see how that would look like in the elephant network:
set.seed(33)
# Contagion with no removal
e_sim_nr <- info_contagion(elephant_graph,
rewire = F,
e = 1,
r_max = 100)
p1 <- e_sim_nr %>%
ggplot(aes(x = time, y = proportion)) +
geom_line() +
theme_bw() +
labs(title = "No removal")
# Now some random removal
rmv <- sample(1:vcount(elephant_graph), 10)
ng <- delete.vertices(elephant_graph, rmv)
# Contagion with no removal
e_sim_r <- info_contagion(ng,
rewire = F,
e = 1,
r_max = 100)
p2 <- e_sim_r %>%
ggplot(aes(x = time, y = proportion)) +
geom_line() +
theme_bw() +
labs(title = "50 nodes removed")
p1 + p2
We have very similar trajectories. What happens when we divide the areas under the curve.
auc_nr <- area_under_curve(x = e_sim_nr$time,
y = e_sim_nr$proportion)
auc_r10 <- area_under_curve(x = e_sim_r$time,
y = e_sim_r$proportion)
auc_r10/auc_nr
## [1] 0.9880085
We notice that the area under the curve after removal is slightly smaller after removal. This means that the trajectory reached the maximum a bit quicklier. I will now repeat this process many times to get a distribution of differences.
# Function to calculate the AUC of multiple runs
multiple_diff_auc <- function(ev, g, reps, turns, num_removals) {
# Place holder
values <- rep(NA, reps)
for (i in 1:reps) {
sim <- info_contagion(g,
rewire = F,
e = ev,
r_max = turns)
auc_nr <- area_under_curve(x = sim$time,
y = sim$proportion)
# Now some random removal
rmv <- sample(1:vcount(g), num_removals)
ng <- delete.vertices(g, rmv)
sim_r <- info_contagion(ng,
rewire = F,
e = ev,
r_max = turns)
auc_r <- area_under_curve(x = sim_r$time,
y = sim_r$proportion)
diff <- auc_r/auc_nr
values[i] <- diff
}
return(values)
}
# Now run 100
diffs_10_rems <- multiple_diff_auc(ev = 1,
g = elephant_graph,
reps = 100,
turns = 100,
num_removals = 10)
diffs_20_rems <- multiple_diff_auc(ev = 1,
g = elephant_graph,
reps = 100,
turns = 100,
num_removals = 20)
diffs_5_rems <- multiple_diff_auc(ev = 1,
g = elephant_graph,
reps = 100,
turns = 100,
num_removals = 5)
aucs_elephant <- tibble(ratios = c(diffs_5_rems, diffs_10_rems, diffs_20_rems),
removals = rep(c(5, 10, 20), each = 100)) %>%
mutate(removals = as.factor(removals))
aucs_elephant[-213,] %>%
ggplot(aes(x = ratios, fill = removals)) +
geom_density(alpha = 0.3) +
geom_vline(xintercept = 1, linetype = 3) +
theme_bw() +
labs(title = "Ratios of areas under the curve",
subtitle = "Elephant data")
Most of the ratios fall squarely around 1. It seems then that random removals - at least for these numbers - do not change trajectories of diffusion much.
Let’s look at the dolphin data.
# Now run 100
diffs_10_rems <- multiple_diff_auc(ev = 1,
g = dolphin_net,
reps = 100,
turns = 100,
num_removals = 10)
diffs_20_rems <- multiple_diff_auc(ev = 1,
g = dolphin_net,
reps = 100,
turns = 100,
num_removals = 20)
diffs_5_rems <- multiple_diff_auc(ev = 1,
g = dolphin_net,
reps = 100,
turns = 100,
num_removals = 5)
aucs_dolphin <- tibble(ratios = c(diffs_5_rems, diffs_10_rems, diffs_20_rems),
removals = rep(c(5, 10, 20), each = 100)) %>%
mutate(removals = as.factor(removals))
It seems we have very large ratios at times. This might be because we remove isolates or unconnected individuals, making transmission more efficient.
aucs_dolphin %>%
arrange(desc(ratios)) %>%
slice(1:10)
## # A tibble: 10 x 2
## ratios removals
## <dbl> <fct>
## 1 76.7 10
## 2 75.5 10
## 3 60.2 10
## 4 8.09 10
## 5 2.55 10
## 6 1.94 10
## 7 1.76 5
## 8 1.64 10
## 9 1.59 10
## 10 1.55 5
We have some pretty big numbers. I’m going to remove those for plotting but they are worth keeping in mind.
aucs_dolphin %>%
filter(ratios <= 40) %>%
ggplot(aes(x = ratios, fill = removals)) +
geom_density(alpha = 0.3) +
geom_vline(xintercept = 1, linetype = 3) +
theme_bw() +
labs(title = "Ratios of areas under the curve",
subtitle = "Dolphin data")
Again, the differences here are centered around one but we notice really long tails with quite a lot of variance. We notice there are considerably more values below one, indicating removal led to more sluggish trajectories of transmission. But we cannot discern a clear pattern here.
This is an initial overview. I think the some next steps to follow could be the following:
Look at diffusion trajectories when we impose higher thresholds on what constitutes a tie.
Look at diffusion trajectories on the networks for different waves.
Incorporate more principled removal procedures.
Incorporate dynamic networks into our diffusion simulations